DOCUMENT RESUME

ED 044 957                      48                      FL 001 967

AUTHOR            Paquette, F. Andre; Tollinger, Suzanne
TITLE             A Handbook on Foreign Language Classroom Testing:
                  French, German, Italian, Russian, Spanish.
INSTITUTION       Modern Language Association of America, New York,
                  N. Y.
SPONS AGENCY      Office of Education (DHEW), Washington, D.C. Bureau
                  of Research.
BUREAU NO         BR-6-261?
PUB DATE          Jun 68
CONTRACT          OEC-1-6-06261?-1876
NOTE              23?p.
AVAILABLE FROM    MLA Materials Center, 62 Fifth Avenue, New York,
                  N.Y. 10011 ($6.00)

EDRS PRICE        EDRS Price MF-$1.00 HC-$11.80
DESCRIPTORS       Achievement Tests, Annotated Bibliographies, French,
                  German, Instructional Program Divisions, Italian,
                  Item Analysis, *Language Instruction, *Language
                  Tests, *Modern Languages, *Objective Tests, Russian,
                  Spanish, Statistical Analysis, Teaching Guides,
                  *Testing, Test Interpretation



ABSTRACT 



This handbook illustrates how the classroom teacher
can make efficient use of tests, discusses the main purposes tests
can serve, treats principal kinds of testing devices especially
useful in language instruction, and includes a section on the
interpretation of test results. Chapters include: (1) the importance
and place of testing in the foreign language program, (2) planning
the classroom test, (3) construction of test items, (4) preparing
test items in French, German, Italian, Russian, and Spanish, and (5)
test interpretation. A glossary of technical terms used in the
handbook and an annotated bibliography on testing are included. (RT)







U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.






A HANDBOOK ON FOREIGN LANGUAGE 
CLASSROOM TESTING: 

FRENCH, GERMAN, ITALIAN, RUSSIAN, SPANISH 



June 1968 



by 




MODERN LANGUAGE ASSOCIATION OF AMERICA 

F. André Paquette, Project Director
Suzanne Tollinger, Research Assistant 



The research reported herein was performed pursuant to a contract with 
the Office of Education, United States Department of Health, Education, 
and Welfare. Contractors undertaking such projects under government
sponsorship are encouraged to express freely their professional judgment
in the conduct of the project. Points of view or opinions stated do not,
therefore, necessarily represent official Office of Education position 
or policy. 



PREFACE 



This Handbook reflects the efforts of many people in testing and
foreign language teaching. The participants in the conferences to
organize the writing of the Handbook included foreign language teachers
at the elementary, high school, and college levels, foreign language
methods teachers, and measurements specialists:



Richard Barrutia
University of California
at Irvine

Frederick Bosco
Catholic University of America
Washington, D. C.

Ludmilla Bradley
Fullerton Junior College
Fullerton, Calif.

Nancy Capozzi
Public School 41 R
Staten Island, N. Y.

Guillermo del Olmo
Rutgers University
New Brunswick, N. J.

Robert J. Di Pietro
Georgetown University
Washington, D. C.

Elisabeth Epting
Converse College
Spartanburg, S. C.

Elizabeth A. Gorra
Eastern Junior High School
Greenwich, Conn.

Mary Heiser
University of Southern California
at Los Angeles

Carlene Horvath
Parkland Joint School District
Orefield, Pa.

Albert JeKenta
Beverly Hills Unified School District
Beverly Hills, Calif.

Tom Kelly
Parkland High School
Orefield, Pa.

Robert L. Lathrop
Pennsylvania State University
University Park, Pa.

Louise Lillard
Beverly Hills High School
Beverly Hills, Calif.

Klaus A. Mueller
University of California
at Berkeley

Josefina V. O'Keefe
Bellevue School District
Bellevue, Washington

Elizabeth Ratte
Purdue University
Lafayette, Ind.

Betty A. Robertson
Northport High School
Northport, N. Y.

Dan Romani
Southern Illinois University
Edwardsville, Ill.

Lorraine Strasheim
Indiana University
Bloomington, Ind.

Robert Stecklein
Bureau of Institutional Research
University of Minnesota
Minneapolis, Minn.

Allan Taylor
University of Colorado
Boulder, Colo.



The idea for this project emanated from the discussions of the
Test Advisory Committee, under whose advisorship the MLA Testing
Program was conducted:

James Alatis
Georgetown University
Washington, D. C.

Don R. Iodice
Oakland University
Rochester, Mich.

Albert JeKenta
Beverly Hills Unified School District
Beverly Hills, Calif.

Albert H. Marckwardt
Princeton University
Princeton, N. J.

Sanford Newell
Converse College
Spartanburg, S. C.

Sol Saporta
University of Washington
Seattle, Wash.

Irving Wershow
University of Florida
Gainesville, Fla.



We would like to thank the following for permission to use
copyright materials:

Modern Language Association of America, for permission to
use several definitions from What's What: A List of Useful Terms for
the Teacher of Modern Languages, compiled by Donald D. Walsh,
copyright MLA 1963, 1964, 1965.

John E. Stecklein, Director, Bureau of Institutional Research,
University of Minnesota, for permission to use excerpts from
Bulletins on Classroom Testing, Numbers 2, 3, 4, 5, 6, and 7, Bureau
of Institutional Research, University of Minnesota, copyright 1954,
1955, 1956, University of Minnesota. This material has been recombined
to form Chapter III of this Handbook. A complete set of these booklets
may be obtained by writing to the Bureau of Institutional Research,
3338 University Avenue, S.E., Minneapolis, Minn. 55415.

At various times during the development of this Handbook, most of
the sample test items were reviewed by the Foreign Language Test Staff
of Educational Testing Service, Princeton, N.J. 08540.




CONTENTS

                                                          Page
Preface                                                    iii
Introduction                                                ix
Chapter I    The Importance and Place of Testing
             in the Foreign-Language Program                 1
Chapter II   Planning the Classroom Test                    12
Chapter III  The Construction of Test Items                 28
Chapter IV   Preparing Test Items
                French                                      40
                German                                      67
                Italian                                     89
                Russian                                    116
                Spanish                                    144
Chapter V    The Interpretation and Use of
             Test Results                                  173
Glossary                                                   208
Bibliography                                               211



INTRODUCTION 



This Handbook has been produced to help the classroom teacher
make efficient use of tests. It contains discussions of the main
purposes tests can serve. It treats the principal kinds of testing
devices that seem to be especially useful for those purposes in the
context of foreign-language teaching. And it has a section (Chapter
V) on the interpretation of test results.

Much of what is contained in this Handbook is useful at all 
levels of instruction and with people of all ages. Some testing 
devices, however, are obviously not appropriate for children in
elementary or even junior-high-school classes. The teacher should have
no difficulty making a judicious choice. 

While most teachers had some formal instruction in the general
principles of measurement during their own undergraduate education,
very few classroom teachers seem to have had any formal instruction in
the principles and methods of testing in the specific context of
foreign-language teaching. Much of the general theoretical discussion
in this Handbook will therefore be more or less familiar to the reader.
It is included here for several reasons: to make the Handbook a
self-contained unit, to remind the reader of notions he may have
forgotten, and to fill in gaps in the reader's background. The reader
will notice that certain topics are treated in several different
chapters. This repetition is deliberate. In each case, one treatment
is much more thorough than the others; the briefer treatments are
intended to serve as useful contexts for the discussion of other
topics. A Handbook is, after all, a reference work: the teacher should
find in one place whatever he needs to know about the topic he has
looked up.

This Handbook is not complete. No book of this kind can be complete.
It may suffice for many teachers. It may inspire others to exercise
their imagination and ingenuity. It may lead still others to work
through more technical expositions of some of the topics treated here.
Those who have helped to prepare this Handbook hope it will be useful
in some way to all of their colleagues.






Chapter I 



THE IMPORTANCE AND PLACE OF TESTING
IN THE FOREIGN-LANGUAGE PROGRAM

1.1 USES OF TESTS

1.1.1 Knowledge and skills can be tested in many ways and for many
reasons. By its very nature, audio-lingual teaching in particular
requires a constant flow of information and encouragement of the kind
supplied by regular and varied testing. It is useful to consider tests
as (1) measures of achievement, (2) diagnostic instruments, and (3)
teaching tools. Tests of language aptitude are another matter: the
preparation of adequate instruments to measure aptitude requires much
time and effort on the part of highly qualified specialists. Teachers
should not try to prepare their own, since carefully prepared tests of
language aptitude are readily available.¹

1.1.2 Measuring Achievement

One of the most common functions of tests is to measure a student's
achievement. An achievement test gives the teacher a carefully
selected sample of a student's performance.

1.1.3 Tests as Diagnostic Instruments

Tests constructed for diagnostic purposes are different from tests
designed specifically to measure language aptitude. Diagnostic tests
meet a continuing everyday need and are helpful to the teacher
throughout the course of instruction. From the results of a test, a
teacher can diagnose possible inadequacies in either his teaching or
his teaching materials. Careful inspection of an individual student's
paper will also often make clear what specific difficulties he is
having. The teacher can use such knowledge to improve his teaching
and the student's learning.

¹ John B. Carroll and Stanley M. Sapon, Modern Language Aptitude
Test, Form A (New York: The Psychological Corporation, 1958);
Pimsleur Language Aptitude Battery (New York: Harcourt, Brace &
World, Inc., 1966).

1.1.4 Tests as Teaching Tools

It can be argued that students can actually learn in the process
of taking a test. This is likely to be the case if the test challenges
the student to learn by analogy and discovery and leads him to try
to understand new notions as the test proceeds. Such tests not only
measure what the student has learned; they also help the teacher to
decide whether the student is learning how to learn a language. Some
items can be included in each test for just this purpose; they may be
very helpful to student and teacher alike in carrying on the
foreign-language program.

1.1.5 Collateral Problems and Benefits

1.1.5.1 The best tests serve several purposes simultaneously.
Classroom tests can provide a basis for assigning grades, for comparing
pupils with one another, for motivating learning, for directing
learning, and for helping to improve instruction in other ways.
Classroom tests also give students information about the objectives
of the course and about their own individual achievement. If a test
faithfully represents the major objectives of a course, preparing for
it and reviewing it after it has been administered will reinforce the
rest of the work of the course.

1.1.5.2 Testing also often confirms the evaluation that a good
teacher has already made of a student. If a test does not confirm the
teacher's opinion, several explanations are possible. The test itself
may be badly constructed; a consistent pattern of low scores for good
students or high scores for weak ones, for instance, usually means
that the test itself is defective. The teacher's previous appraisal
may have been based on insufficient information. The item types chosen
may not have been used often enough before for the students to have
become familiar with them; the students may therefore have
misunderstood what was expected of them and given incorrect answers,
even though they knew the correct ones. Finally, a teacher's day-to-day
evaluation of a student may be quite correct, but formal testing,
required primarily to give the carefully measured results necessary
for permanent records, may not be enough to corroborate that appraisal.

1.1.5.3 The questions used in classroom tests provide students with
information about the aims of the course. They also direct the
students' efforts toward the acquisition of the specific facts and
skills needed to achieve those aims. When the teacher sets test
specifications and devises test questions to help him judge his
students' progress toward particular goals, he may realize that those
goals have implications of which he was not aware.






1.1.6 Spaced Testing

The principle of spaced learning holds that in foreign-language
teaching it is better to teach or drill a given item for two minutes
on ten different occasions than for twenty minutes at one time. How
is this principle related to testing? Frequent short tests generally
seem to be more effective than infrequent long tests, for several
reasons. If students gradually become used to a variety of item types
on short tests, longer and more complex examinations are fairer and
more reliable, because the students are comfortable with their
formats. By giving frequent tests, the teacher can diagnose problems
and correct them as he goes along. Frequent tests also ensure that
most students will study regularly and diligently.

1.2 SPECIFIC TECHNIQUES

1.2.1 To determine whether students can make logical and well-chosen
changes in the dialogues they memorize, a simple method is to put on
the blackboard a series of phrases from which they are to select in
order to vary the dialogue. Pairs of students are asked to make
substitutions orally by selecting appropriate phrases from among
those listed on the blackboard. Each partner must adjust to the
accumulated variations and, in turn, make substitutions that fit the
rest of the evolving conversation. With this technique, the teacher
tests not only the student's ability to read aloud but also his
ability to understand what is said to him. Such a test teaches at the
same time as it tests.






1.2.2 Impromptu Testing

The format of an oral test need not be rigid. The seemingly
unstructured nature of a test may be its best quality. For example,
the teacher may simply move about the room, asking as many short,
quick questions as he can in ten minutes. These questions can vary
but must, of course, stay within the student's range of vocabulary and
grammar. One problem with this procedure is that there is not enough
time to grade the students' responses. This difficulty can be overcome
with a pre-taped test, described below (see 1.2.3.2).

1.2.3 Directed Responses

1.2.3.1 Another type of loosely structured oral test derives from
directed dialogue. The teacher instructs Student One to tell Student 
Two to ask Student Three whether he has done, or will do, something 
for him (Student One). It is important that all three students become 
engaged in the exchange so that they all have to modify answers and 
inflect verbs in various ways. Here, too, testing and teaching are 
not easily separated. 

1.2.3.2 Both of the techniques described in 1.2.2 and 1.2.3.1 have
the disadvantage of being difficult to grade: they move so quickly
that the teacher must rely on his memory when he records his
evaluation of each student's performance. The following arrangement
largely overcomes this disadvantage. Prerecord a large number of
questions and statements that permit rejoinders. After each question
or statement, leave a pause long enough for a student to answer, then
record a random number ("one", "two", etc., but in the foreign
language) followed by another pause. Make as many such items as
necessary to test all of the students in the class adequately. Assign
numbers to the students especially for the test. Play the tape, and
have each student answer whenever his number comes up; he must also
write out his answer on the test sheet. The students will be alert
throughout the test, since the numbers are recorded at random. They
will all formulate answers to all of the questions, even though each
individual student will actually have to answer only five or six
questions. The teacher is free to move about with grade book in hand
to make judgments and record grades. Each student has enough time to
write out his answers, because no number is ever recorded twice in
succession. The test papers can be collected and used by the teacher
to decide on an additional grade for writing.
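The pre-recorded test depends on a call order in which each student's number comes up several times, at random, but never twice in succession. The following is a minimal sketch, in Python, of one way to draw such an order before recording the tape. The function name `call_order`, its parameters, and the class size are illustrative assumptions and are not part of the Handbook's procedure. The sketch uses rejection sampling: it shuffles the full list of calls and keeps the first shuffle with no adjacent repeats. As a result, every acceptable order is equally likely to be chosen.

```python
import random


def call_order(num_students, calls_per_student, seed=None, max_tries=100_000):
    """Return a random order in which to record the student numbers.

    Student numbers run from 1 to num_students. Each number appears exactly
    calls_per_student times, and no number appears twice in succession, so a
    student always has time to write out his answer before he can be called
    again. (Illustrative helper; name and parameters are assumptions.)
    """
    if num_students < 2:
        raise ValueError("at least two students are needed to avoid repeats")
    rng = random.Random(seed)
    calls = [n for n in range(1, num_students + 1) for _ in range(calls_per_student)]
    for _ in range(max_tries):
        rng.shuffle(calls)
        # Keep this shuffle only if no two neighbouring calls are the same number.
        if all(a != b for a, b in zip(calls, calls[1:])):
            return calls
    raise RuntimeError("no acceptable order found; try fewer calls per student")


# Hypothetical class of 25 students, each called 5 times (125 recorded numbers).
order = call_order(25, 5, seed=1968)
print(order[:12])
```

For a class of 25 with five calls each, a few dozen shuffles are typically needed before one is accepted, which is instantaneous. The `max_tries` limit is only a safeguard against looping indefinitely.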

1.2.4 Pattern Paragraph Practice

The more closely test items resemble in format the regular classroom
presentation of the material being tested, the better students are
able to handle them. Items of a type familiar to the student have much
higher reliability than those that present material in a format the
student has never encountered. Suppose a class has been practicing
paragraph writing by replacing lexical items (nouns, verbs, etc.)
while leaving function words (prepositions, conjunctions, etc.)
intact, thereby keeping the original grammatical structure of the
paragraph but altering its content. For practice in writing, the
instructions could be: "Copy the following short paragraph first as
it is. Then rewrite it, changing the underlined words to words of your
own choice that make sense and are grammatically correct within the
reconstructed whole." For test purposes: "Rewrite the following
paragraph, changing each underlined word to any appropriate word of
your choice."

1.3 MOTIVATION

In addition to verifying assimilation of the material covered,
good tests motivate students to strive to attain certain objectives.
When he is taking a test, the student is at the peak of his powers of
attention and concentration. Some would argue that the teacher should
take advantage of this favorable situation and include some test
items clearly designed to lead the student to see new relationships,
items that will lead him to learn something new. More generally, it is
clear that much of a student's best studying is done when he is
preparing for a quiz or an examination. This heightened and sharpened
effort can be focused directly on whatever the teacher chooses to
emphasize. The teacher need only be sure that the student knows what
to expect on the test: what material will be covered, and what testing
techniques will be used. Testing is one of the strongest motivating
forces available to the teacher, and it is a force that can be focused
clearly on specific objectives. Short, frequent tests concentrating on
one or two skills at a time will tend to sustain a relatively high
level of concentration.



1.4 GENERALIZATIONS

If the student does not know in advance exactly which uses of the
subjunctive, for example, will appear on a test, he will tend to study
the whole category. The reliability of such a test, however, may be
open to question if the test is too short or if there are not enough
items (approximately three) testing each specific point. (Pairs of
items permit more reliable judgments than single items; sets of three
items are more reliable still.) For the most part it is safe to say
that if a student can properly conjugate three regular verbs, he can
conjugate any number of others; that if he can write two correct
contrary-to-fact sentences, he can write another; and so on. It also
seems reasonable to conclude that if a student can write an acceptable
paragraph, he could write an acceptable composition. The consistency
of performance of a group of students also allows us to generalize
about the rank order of the group: the best student in the class on
one day is not likely to be the worst on the next.
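The Spearman-Brown formula can make concrete the claim that pairs and triples of items support more reliable judgments than single items. This is a standard psychometric result that the Handbook does not itself cite. It assumes the added items are parallel, that is, equally difficult and measuring the same point. If one item has reliability ρ₁, then a set of k such items has reliability ρ_k. The value ρ₁ = 0.30 below is purely illustrative.

```latex
\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}

% Illustrative single-item reliability \rho_1 = 0.30:
\rho_2 = \frac{2(0.30)}{1 + 0.30} = \frac{0.60}{1.30} \approx 0.46,
\qquad
\rho_3 = \frac{3(0.30)}{1 + 2(0.30)} = \frac{0.90}{1.60} \approx 0.56
```

Under these assumptions, each added item raises reliability, but by a smaller amount each time (0.30 to 0.46, then 0.46 to 0.56). This is one way to see why about three items per point can be a reasonable compromise in a short test.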

1.5 LISTENING COMPREHENSION

1.5.1 Comprehension practice and tests can be conducted efficiently
by having the student listen, either in the language laboratory or at
home, to recorded material with which he is unfamiliar. His task is to
write out the text, taking as much time as he needs to transcribe it
all as correctly as he can. In the classroom, the teacher may have
several students copy sections of the text onto the blackboard so that
the entire script is readily available to the whole class. The whole
piece can then be carefully studied for meaning, grammatical
difficulties, and orthography. It can also be read aloud.

1.5.2 A text to be used for a comprehension test may be a prerecorded
paragraph, dialogue, or narration that the student is expected to deal
with in much the same way. He is limited, of course, in the amount of
time he can spend writing out what he hears. Such a test is
essentially a dictation. (The text can also be read aloud by the
teacher.)

1.6 A NOTE OF CAUTION

1.6.1 Teaching and testing are not so much alike that all testing
techniques should or can be used for teaching. While most teaching
techniques can be used for testing, the reverse is not true. For
instance, a good test should test everyone in a class. It must
challenge, and indeed go slightly beyond, the competence of the best
student in the class; otherwise, the limit of that student's ability
remains unknown. From this point of view, even one perfect paper means
that the test has not fully measured the whole class. But a single
perfect paper does not invalidate the test as a good measure of the
great majority of the students in the class. It is dangerous, however,
always to aim over the head of the best student; every student needs
positive reinforcement continually, or he may lose interest and
momentum.

On the other hand, if a test item is so easy that every student
answers it correctly, it cannot discriminate between good and poor
students. Whenever the tester's purpose is to make such a
discrimination, items of this kind are invalid.² Those same items,
however, may measure very successfully the achievement of a class.
Individual test items are valid or invalid as they measure or fail to
measure what the examiner wants to evaluate. (See Chapter V.) In class
we regularly provide challenges to which the entire class can respond
successfully. Such drills teach and simultaneously permit the teacher
to evaluate each student's achievement. If they are carefully
constructed, however, and if the whole class has successfully learned
the point at issue, every student will respond correctly. Such
techniques are useful for evaluating the students' performances, but
they are not very useful for ranking them. This is an example of a
teaching technique that can be used for testing for one purpose, but
not for another.
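Two simple item statistics can be used to check the point about an item that every student answers correctly. The following sketch assumes one common convention, which is not necessarily the one used in Chapter V. Difficulty p is the proportion of students who answer the item correctly. The discrimination index D is the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. Each group is about 27 percent of the class, ranked by total score. An item that everyone gets right has p = 1 and D = 0. The function name and the sample scores are hypothetical.

```python
def item_statistics(scores, group_fraction=0.27):
    """Return (difficulty, discrimination) for each item.

    scores: one list per student of 0/1 item scores (1 = correct).
    difficulty p: proportion of all students answering the item correctly.
    discrimination D: proportion correct in the top group minus proportion
    correct in the bottom group, where the groups are the highest- and
    lowest-scoring fractions of the class by total score.
    (Illustrative convention; Chapter V may use other statistics.)
    """
    n_students = len(scores)
    n_items = len(scores[0])
    ranked = sorted(scores, key=sum, reverse=True)  # best total score first
    g = max(1, round(n_students * group_fraction))  # size of each group
    top, bottom = ranked[:g], ranked[-g:]
    results = []
    for i in range(n_items):
        p = sum(s[i] for s in scores) / n_students
        d = (sum(s[i] for s in top) - sum(s[i] for s in bottom)) / g
        results.append((p, d))
    return results


# Hypothetical results for six students on three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
]
for item, (p, d) in enumerate(item_statistics(scores), start=1):
    print(f"item {item}: p = {p:.2f}, D = {d:.2f}")
```

In this made-up class, every student answers item 1, so the item cannot separate stronger from weaker students. It may still confirm that the whole class has learned the point.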

1.6.2 The student must know precisely what material a test will cover
and what skills will be emphasized, but at the same time he should
understand that acquiring a second language is necessarily a
cumulative process and that grammar covered at any given point in a
course naturally continues to be tested incidentally from then on.
There should be frequent tests deliberately recombining a number of
points. Some students who do well while the class is working on a
particular topic may be unable to produce those same structures
spontaneously at a later time. When this occurs on a test, the test
is useful for diagnostic purposes. The teacher should take information
of this kind into account in planning his course.

² Cf. validity: the extent to which a test measures what it
is supposed to measure.



1. 7 CONCLUSIONS 

(a) Tests can be used to measure (1) aptitude and (2) achievement;
they can also be used for (3) diagnostic and (4) teaching
purposes.

(b) A good test will bear out what the teacher already knows 
about most students' achievement. 

(c) When a test does not confirm a teacher's earlier judgment of 
a class or a student, it is especially useful as a diagnostic 
tool. 

(d) Tests can sometimes actually teach by imitation, by induction, 
or by analogy. 

(e) The principle of spaced learning applies equally well to 
testing. Frequent, short tests are more reliable, more 
productive, and fairer to the students than infrequent long 
ones. Both short tests and longer, more comprehensive tests
should nonetheless be used.

(f) Many teaching techniques are also excellent for testing. 

(g) Short tests are reliable if they contain enough items on 
each point tested. 

(h) Comprehension practice is often neglected. 

(i) Tests can be a powerful means with which to motivate students 
to strive to reach specific objectives. 

(j) Grading oral production can be made easier by the use of a 
simple tape-recorded test. 






Chapter II 

PLANNING THE CLASSROOM TEST 



2.1 WHAT TO TEST

2.1.1 A test is a sample of a student's work. Because it is generally
not possible to cover in a test everything that a student has been
taught, a test contains a selection from among all of the questions
that he might possibly have been asked on specific topics. This sample
should enable the teacher to generalize about the student's progress
or achievement in the total area from which the sample is drawn.
Therefore, before the teacher begins to compose a test, he should have
clearly in mind the purpose the test is to serve, the specific language
skills and content areas that are to be measured, and the emphasis he
wants to put on various aspects of achievement.

If these factors are not taken into account in preparing a classroom
test, there is a serious risk that the test may provide a dangerously
misleading picture of the students' progress.

2.1.2 If the teacher writes and assembles test items without a clear
plan, he may overemphasize language skills and content areas in which
items are easy to construct, simple to administer, and easy to grade,
or in which he himself is especially interested, even though they may
not reflect accurately what has been taught in class.

2.1.3 Ideally, in planning a test the teacher should specify the
language skills and content areas to be measured and the relative
emphasis to be assigned to each, in much the same way as the publishers
of standardized tests do. Setting the specifications for a test need
not, however, be an especially elaborate process, although for major
examinations it is more exacting than for simple quizzes. Sometimes
all that is needed is a simple list of the topics that must be
included to provide a representative sample of the language skills
and content areas to be covered in the test, together with an
indication of their relative importance.
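Once the topics and their relative importance are listed, the teacher can turn them into item counts for a test of a given length. The following is a minimal sketch of that arithmetic. The topic names, weights, and 40-item length are hypothetical. The rounding rule, which gives leftover items to the largest remainders, is one reasonable choice and is not prescribed by the Handbook. Any rounding that keeps the total correct would serve.

```python
from math import floor


def allocate_items(weights, total_items):
    """Split total_items among topics in proportion to their weights.

    Each topic first receives the whole-number part of its exact share;
    any items left over go to the topics with the largest fractional
    remainders, so the counts always add up to total_items.
    """
    total_weight = sum(weights.values())
    shares = {t: total_items * w / total_weight for t, w in weights.items()}
    counts = {t: floor(s) for t, s in shares.items()}
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(shares, key=lambda t: shares[t] - counts[t], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


# Hypothetical specifications for a 40-item unit test (weights = relative emphasis).
spec = {
    "listening comprehension": 3,
    "reading comprehension": 2,
    "verb forms": 2,
    "vocabulary": 1,
    "dictation": 1,
}
for topic, n in allocate_items(spec, 40).items():
    print(f"{topic:25} {n:3}")
```

The result is only a starting point. The teacher may still adjust individual counts, for example to keep about three items on each specific point, as suggested in 1.4.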

2.1.4 The teacher must decide when to test, what kinds of items to
use, how many of each to include, how the test should be given, who
should score it, whether or not the scores should be corrected for
guessing, and what the student's grade will mean.

2.2 WHEN TO TEST

2.2.1 All tests should serve a useful purpose. Tests should not be
given just for the sake of testing; they should not be used as
instruments of punishment; and they should not be given to free the
teacher to attend to other tasks.

In practice, however, the testing schedule is frequently regulated by
the school's fixed schedule for marking and reporting grades. Some
teachers test only at the end of the marking period. Others give
full-period tests every four, six, or eight weeks, or at the
completion of major units of study. Teachers who use tests for
instructional purposes give short tests frequently; for instance,
some use unannounced quizzes to motivate students to do their
assignments carefully and on time.

2.2.2 Frequent tests have several advantages. Both teacher and
student are kept better informed about the student's progress; the
teacher has a larger number of observations on which to base grades;
and the effectiveness of the teaching of specific topics is evaluated
before it is too late to correct misconceptions. However, tests can be
too frequent: the student can easily be led to spend too much effort
preparing for tests for the sake of grades.

2.2.3 The time within the school day when a test can be given is
usually predetermined by the school schedule. In any case, there is
very little evidence supporting a preference for any particular day or
hour for testing. It is only sensible, however, to avoid scheduling
tests on the morning after an evening affair such as a school dance or
for the hour following an exciting activity such as a football rally.

2.2.4 If possible, end-of-year tests should be scheduled before
school closes so that the teacher can discuss the test results with
the students. Unfortunately, many final-examination schedules do not
allow time for students to find out what questions they missed or why
they missed them.

2.3 TYPES OF TEST ITEMS

2.3.1 The relative merits of essay questions and objective items
have been much debated. Neither type of item is inherently superior
to the other. The important questions are (1) whether a type of
item serves the purpose of the test, and (2) how well an item is
constructed.



2.3.2 Essay Questions

The essay question requires the student to develop answers from
his own knowledge without benefit of suggested possibilities, and to
express the answer in his own words. When a test is concerned with
cultural information, the content of literary works, or literary
analysis, essay questions are clearly appropriate. Essay questions
can be prepared easily; they can often simply be written on the
blackboard. Essay questions very largely eliminate guessing. They can
also serve to measure complex abilities when students are required to
describe, explain, compare, contrast, analyze, criticize, interpret,
or generalize. Unfortunately, however, many essay questions do not
measure such abilities. All too often students are asked merely to
state, in continuous prose, facts or information that they are
supposed to have learned. (At early and intermediate stages of
language study, it is wholly appropriate, of course, to ask the
student simply to state facts or opinions in straightforward prose.
More complex tasks should be reserved for advanced courses.) Essay
questions may also be stated so ambiguously or in such general terms
that it is difficult to decide whether the student has really treated
the topic.

2.3.3 The principal disadvantage of the essay question is the
unreliability of the scoring of the answer. One reason it is
difficult to achieve reliability in scoring answers to essay questions
is that teachers differ greatly in their judgments, usually because
they apply very different criteria to the same paper. The teacher's
judgment may even be influenced by how a paper looks: the easier the
paper is to read, the higher the grade assigned.¹ The teacher's
judgment may also be influenced by the "halo" effect: he may tend to
mark a student according to preconceived notions of the student's
ability or of the general ability of the class as a whole. The "halo"
effect may also operate from question to question: the quality of the
answer to the first question may affect the scoring of the answers to
subsequent questions. Finally, teachers frequently correct tests after
a long school day, even late in the evening, and their scoring may
therefore be directly affected by real fatigue.

2.3.4 There are two other limitations to essay questions. First,
there is the possibility of inadequate sampling when only a few
items are included in a test to cover a large body of material.
Second, each essay question on a test represents a large part of
the total score. Therefore, the effect on the total score of a
single unsatisfactory essay is necessarily much greater than that of
one or a few unsatisfactory short items among many.

The student is frequently allowed to choose one of a number of
topics on which to write. If students are to be compared with one
another, a choice among essay questions should not be permitted,
because, unless the questions are equated for difficulty, it is
difficult to judge how well the students who chose to answer
questions 1, 2, and 4, for example, would have done had they chosen
to answer questions 3, 5, and 6 instead. It also sometimes happens
that the better students attempt the more challenging questions and
write less acceptable answers than do the less able students who
choose to answer the easier questions.

2.3.5 Objective Questions

Objective items have at least four advantages: 

1. Since they permit the teacher to sample a greater variety 
of areas in a relatively short time, the student’s 
strengths and deficiencies are more likely to reveal 
themselves. 

2. They can be reliably scored; if the items are unambiguous 
and the test has been properly keyed, scoring errors will 
be clerical errors rather than errors of judgment. 

3. They are more easily scored and, therefore, scoring time 
is reduced. (They can often be scored by the students 
themselves. ) 

4. They lend themselves rather readily to item analysis.¹
(Consequently, over a period of time, the teacher can
assemble a file of items by keeping from each test the
questions that actually discriminate well.)

2.3.6 The greatest limitation of objective questions is that they
are difficult to construct if they are to test more than sheer memory.
Objective questions, especially multiple-choice items, can test most
of the higher intellectual abilities and skills that can be measured
by essay questions. But, because the construction of objective items
that do measure more than memory requires much time and considerable
ingenuity, teachers are likely to be content with questions that
test only knowledge of facts, sometimes very trivial facts.

¹For an explanation of item analysis, see Chapter V.

2.3.7 A second limitation of objective questions is that they lend
themselves to guessing on the part of the student. Guessing can be
controlled by correcting for guessing, but correction for guessing
is difficult for teachers to apply, and there is ultimately no way
to differentiate between answers that are wrong because the student
has merely guessed and those that are wrong because the student has
answered in good faith but on the basis of misinformation.

2.3.8 Four types of objective questions are especially common in
foreign-language teaching: completion, true/false, matching, and
multiple-choice. All of these types can be useful if the items are
well composed. They are all easy to use in classroom tests, and a
large number of them can be administered in a relatively short time.

2.3.9 There is no reason why a variety of item types cannot be
used in the same classroom test; the decision must depend on the
purpose of the test. When more than one item type is used, however,
items of the same type should be grouped so that the student is not
confronted with frequent changes of directions.

2.4 LENGTH OF TESTS

2.4.1 How long a test should take depends on the purpose that the
test is to serve. Except for special examinations, most longer
classroom tests are limited by the length of the regular class
period. (In general, however, the longer the test, the more reliable
the scores.)




2.4.2 The classroom test should be deliberately planned so that most
students have time to attempt all questions as they work at their
normal speed. Anxiety is likely to be accentuated when there is
pressure to work fast.

2.4.3 The number of objective questions that can be answered in any
given length of time depends upon the questions and the work habits
of the individual students; the fastest student in the class may
finish a test in half the time that it takes the slowest student.
For multiple-choice and completion items, many teachers find that
most students can complete about 50 questions in 40 minutes;
true/false and matching questions can often be answered at the rate
of two per minute. It must be emphasized, however, that each
objective item type is very flexible; much depends on the use the
teacher makes of it. One teacher may regularly write short items of
one type that students can answer quickly, while another may use the
same type but write items that take much longer to answer. The length
of time required for essay questions depends in large measure on
the complexity of the questions. The teacher can write out an
acceptable answer to each essay question and multiply his time by at
least three to estimate the time his students will need. If in doubt,
the teacher should take the whole test himself, reading every question
through completely and conscientiously writing out appropriate answers;
he will then be better able to estimate the amount of working time
that will be required by the students.
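The rules of thumb above can be turned into a quick planning
estimate. What follows is a minimal Python sketch and not a procedure
from this handbook. The function name estimate_minutes, the item-type
labels and the per-item rates are assumptions. The rates are taken from
the averages quoted in 2.4.3: 0.8 minute per multiple-choice or
completion item, half a minute per true/false or matching item, and
three times the teacher's own writing time for essays.

```python
# Rules of thumb quoted in 2.4.3, expressed as minutes per item.
# These are rough averages for "most students", not measured values.
MINUTES_PER_ITEM = {
    "multiple_choice": 40 / 50,  # about 50 items in 40 minutes
    "completion": 40 / 50,
    "true_false": 1 / 2,         # about two items per minute
    "matching": 1 / 2,
}
ESSAY_FACTOR = 3  # students need at least three times the teacher's writing time


def estimate_minutes(item_counts, teacher_essay_minutes=0.0):
    """Rough working time for a draft test.

    item_counts maps an item type (a key of MINUTES_PER_ITEM) to the number
    of items of that type; teacher_essay_minutes is the time the teacher
    took to write acceptable answers to the essay questions.
    """
    objective = sum(MINUTES_PER_ITEM[kind] * count for kind, count in item_counts.items())
    return objective + ESSAY_FACTOR * teacher_essay_minutes


# Hypothetical draft: 20 multiple-choice, 10 completion, 20 true/false items,
# plus one essay question the teacher answered in 4 minutes.
draft = {"multiple_choice": 20, "completion": 10, "true_false": 20}
print(estimate_minutes(draft, teacher_essay_minutes=4))  # 46.0
```

The slowest student may need up to twice as long as the fastest.
For that reason, the result is best read as a typical working time,
and some margin should be left within the class period.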

The student's experience of tests grows with the length of time he
has spent in school. The more experience of this kind he has had,
the faster he can work. The elementary-school child cannot be
expected to take tests as long as those administered at the
junior-high-school level; tests administered to junior-high-school
students should be shorter than those administered to students in
the senior high school, and so on.

2.5 RELATIVE DIFFICULTY OF TESTS

2.5.1 The level of difficulty of a test should also be largely
determined by the purpose that the test is to serve. An achievement
test may be relatively easy; a test that is to be used primarily for
grading purposes should be moderately difficult; a test that is to
be used to isolate the most able students should be difficult.

2.5.2 In composing a test of minimum essentials, the teacher need
not deliberately strive for a great range of difficulty. Instead, he
may use a large number of questions that he hopes will be answered
correctly by the great majority of his students. He can then be
sure that they have mastered those essentials, and he can identify
the few students who have not and for whom he must plan special
remedial work. In general, such tests are not very effective in
discriminating among the various levels of achievement from highest
to lowest.

2.5.3 If, on the other hand, a test is to be used to identify
students who may be eligible to compete for special awards or to
enroll in accelerated classes, etc., the teacher will need to use
more difficult questions that can be answered correctly by
comparatively few students.




2.5.4 On end-of-unit tests, mid-term tests, or final examinations,
it is important to be able to rank students in the order of their
level of achievement. For tests of this type, many teachers like
to use a few very easy items to encourage the poorer students, a
few very difficult items to challenge the better students, and many
items of moderate difficulty. It must be clearly understood, however,
that in the last analysis it is desirable for all students to do well
on all tests. If they do, there will be few C's, D's, and F's, but
the students will know the material they have studied. The teacher
must beware lest he be carried away by considerations of grading.
He must not deliberately go looking for students whom he can classify
as poor. Research has shown that students react so differently to
individual items that it is not necessary to vary the difficulty of
items deliberately. In most cases, the more reliable and more valid
tests are those in which all items are close to the 50 percent level
of difficulty, i.e., each item is answered correctly by about 50
percent of the students. The variability of the students themselves
in their responses to individual items is so great that a wide range
of scores still results.
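A standard statistical fact helps show why items near the 50 percent
level are useful, although this handbook does not put it in these
terms. Suppose an item is scored right (1) or wrong (0) and a
proportion p of the students answer it correctly. The variance of that
item is then p(1 - p), which is largest when p = 0.5. The Python sketch
below computes each item's difficulty (proportion correct) and shows
where that variance peaks. Its function names and responses are
hypothetical. It models single items only; the spread of total scores
also depends on how the items relate to one another.

```python
def item_difficulties(responses):
    """Proportion of students answering each item correctly.

    responses is a list of rows, one per student, with 1 for a right
    answer and 0 for a wrong one.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]


def item_variance(p):
    """Variance of a right/wrong item answered correctly by a proportion p."""
    return p * (1 - p)


# Hypothetical responses of four students to three items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(item_difficulties(responses))  # [0.75, 0.5, 0.25]

# Among these difficulty levels, the item variance is greatest at 0.5.
print(max([0.1, 0.3, 0.5, 0.7, 0.9], key=item_variance))  # 0.5
```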



2.6 FORMAT 

2.6.1 Certain aspects of the format of a test need to be determined
when the test is planned. The teacher must first decide whether the
test will be presented orally or in written form. Stimulus material
for tests of listening comprehension is presented orally, but it is
best to present the responses in written form if they are of the
multiple-choice type and each item has more than three choices.
Speaking tests may have both oral and written stimuli, but the
responses will be given orally, of course. Objective tests of
reading and writing must obviously be presented in written form.

2.6.2 Since legibility, attractiveness, and economy of space are
important, tests should be typed rather than handwritten. Responses
for multiple-choice items should be listed vertically rather than
horizontally whenever possible. If the answers are to be recorded
directly on the pages of the test, the spaces provided for the
answers to objective questions should be placed conveniently both
for the student and for the teacher when he comes to score the
paper. If a simple answer sheet can be designed for use with a test,
the time required for scoring will be greatly reduced. It is
important that items be grouped by type to keep to a minimum the
number of different sets of directions. Items should also be grouped
by content within each item type, and arranged from easy to difficult
within the test as a whole as well as within each major subdivision
of the test; such an arrangement of items makes it easier for the
teacher to analyze student performance for diagnostic purposes.

2.6.3 For essay tests, wherever different amounts of credit are
allowed for different questions or for different parts of a single
question, the credit to be allowed should be decided when the test
is planned and should be clearly indicated at appropriate places on
the test. The question is frequently asked: If objective questions
are specifically designed to have different degrees of difficulty,
should not correspondingly different amounts of credit be given for
them? In practice, even though such differential weighting of
objective questions might be desirable, assigning appropriately
different amounts of credit is a complex process, and scoring such a
test is complicated and time-consuming.

2.6.4 Finally, the teacher may use a variety of media as bases for
test items: pictorial material shown with an opaque projector or
an overhead projector, films, filmstrips, models, demonstration
performances, etc. The availability of materials of these kinds
can influence the choice of item types and the way in which they are
presented to the students.

2.7 SCORING

2.7.1 Every test should be planned so that it can be scored as
reliably as possible. For objective items, scoring is easy if the
items are all well constructed and if the key is correct. For
essay tests, scoring is more complicated. The task is simplified,
however, in three cases: if the questions themselves limit the
student's freedom of response, if the teacher has a clear notion of
the responses he will allow, and if he has decided exactly how much
credit to give for each separate part of the test.

While most classroom tests are scored by the teacher,
some tests (especially shorter quizzes or tests of objective type)
can be scored by the students if they are closely supervised.

For objective tests, scoring is straightforward: answers are
either right or wrong. For essay tests, in which subtle
judgments are involved, the teacher necessarily bears the whole
burden of responsibility.

2.7.2 Because essay tests are difficult to score reliably, some
people consider it desirable to have more than one teacher judge a
student's work. Arrangements can sometimes be made to have each
student's test scored by two different teachers; the two scores can
then be averaged. There is no guarantee, however, that this
procedure will really reduce the ultimate subjectivity of the grade;
it does make it more probable that the final score is reliable.

When all teachers working at the same grade level collectively
develop and give the same tests at the same time, it is possible to
have each item or set of items scored by a different teacher; the
final grade for each student then depends on the judgments of a
number of teachers rather than on the judgment of a single teacher.

2.7.3 For the teacher who relies on his own judgment alone in
grading essay questions, the following suggestions may help him
achieve more reliable scores:

1. Before scoring the first paper, have a clear notion of
the answers you will allow for each question and of the
weight you will give to each of the various elements in
the answer.

2. Score all answers to a particular question before going
on to the next question.

3. Grade without knowing the identity of the student
whenever possible.

4. Use categories rather than percentage grades (i.e., use
A, B, C, D, F; or 5, 4, 3, 2, 1; or Excellent, Good,
Fair, Poor, Failing; etc., rather than 87, 76, 64,
and the like).

2.7.4 The teacher can check the reliability of his own scoring
by grading a set of essays without recording his grades on the
papers themselves, rescoring them after an interval of time, and
comparing the two sets of results. (For this purpose, it is
especially important to grade each paper without knowing whose it is.)
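Suppose the essays are graded in categories such as 5, 4, 3, 2, 1, as
2.7.3 suggests. The comparison in 2.7.4 can then be made concrete by
computing how often the two scorings agree exactly and how closely
they correlate. The handbook does not prescribe either measure. The
Python sketch below, with hypothetical names and grades, shows only
one way to do it.

```python
from math import sqrt


def agreement(first, second):
    """Share of papers given the same category in both scorings."""
    same = sum(a == b for a, b in zip(first, second))
    return same / len(first)


def pearson(x, y):
    """Pearson correlation between the first and second scorings.

    Raises ZeroDivisionError if either scoring gives every paper the same grade.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical grades (5 = Excellent ... 1 = Failing) for eight essays,
# scored twice without names, some weeks apart.
first = [5, 4, 4, 3, 3, 2, 2, 1]
second = [5, 4, 3, 3, 2, 2, 2, 1]
print(agreement(first, second))           # 0.75
print(round(pearson(first, second), 2))   # 0.94
```

Values near 1 on both measures suggest consistent scoring. Markedly
lower values suggest that the teacher's criteria shifted between the
two readings.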

2.7.5 These suggestions may make the scoring of essay questions
look simple. It is not. The scoring of essay questions in
foreign-language tests is very complicated. Among other things, the
teacher must decide how much importance to give to language and how
much to content. He must also decide whether he should grade
language and content separately. If he does grade them separately,
how shall he combine the two grades to arrive at a single grade?

2.8 CORRECTION FOR GUESSING 

2.8.1 Correction for guessing is intended to reduce to zero the
chance score² of the student taking a test on which he makes a
choice among responses.³ A correction for guessing is applied in
the scoring of many standardized tests; however, it is not really
necessary to correct for guessing for most classroom tests. If
every student tries to answer all the questions on a classroom test,
the rank order of scores will not be changed by correction for
guessing.

²Chance score: the score that can be obtained by guessing alone.

³The formula for the correction for guessing is
$\text{Score} = R - \frac{W}{n-1}$, where $R$ represents the number
of right responses, $W$ represents the number of wrong responses, and
$n - 1$ represents one less than the number of choices. Thus, the
formula for scoring five-choice multiple-choice questions is
$\text{Score} = R - \frac{W}{4}$, and the formula for scoring a
true/false test is $\text{Score} = R - \frac{W}{1} = R - W$.
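Two of the claims above can be checked with a short computation.
First, suppose a student guesses blindly on N items with n choices
each. On average he gets N/n right and N(n - 1)/n wrong, so
R - W/(n - 1) = N/n - N/n = 0. This is why the correction reduces the
chance score to zero. Second, suppose every student answers all N
items. Then W = N - R, and the corrected score equals (nR - N)/(n - 1).
That value increases with R, so the rank order cannot change. The
Python sketch below illustrates both points. Its function names are
hypothetical, and it uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def corrected_score(right, wrong, choices):
    """Score = R - W/(n - 1); items left blank count neither right nor wrong."""
    if choices < 2:
        raise ValueError("an item needs at least two choices")
    return Fraction(right) - Fraction(wrong, choices - 1)


def rank_order(scores):
    """Indices of the students, from lowest score to highest."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Five-choice items: Score = R - W/4.
print(corrected_score(30, 10, 5))  # 55/2, i.e. 27.5

# Blind guessing on 40 five-choice items gives, on average, 8 right and
# 32 wrong; the correction brings that chance score to zero.
print(corrected_score(8, 32, 5))  # 0

# When every student answers all N items, W = N - R, so the corrected
# score rises with R and the rank order of the students is unchanged.
N = 40
raw = [34, 22, 29, 15]
corrected = [corrected_score(r, N - r, 5) for r in raw]
print(rank_order(raw) == rank_order(corrected))  # True
```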

2.8.2 Some would argue, nevertheless, that occasional correction
for guessing on objective classroom tests at least acquaints students 
with the effects that such correction may have on the score; the 
experience may be useful to them when they come to take standardized 
tests that are corrected for guessing. 

2.9 DETERMINING A PASSING SCORE

2.9.1 We know that we cannot always write tests of the same diffi- 
culty. We know that we cannot estimate the difficulty of the 
questions in a test precisely enough to define a passing grade (or 
any other cut-off score) before the test has been administered. 
Therefore, the scores on any test must be scrutinized care- 
fully and adjusted in terms of the performance of the whole group 
before they are interpreted or reported. Teachers must not be 
afraid to adjust test scores; if the scores are unexpected, the 
fault may lie with the teacher or the test rather than with the 
students. 

2.9.2 It is the teacher who is responsible for measuring student
achievement day by day and week by week. That measurement must
be as reliable and as valid as he can make it. If a test is
carefully planned, the task of writing the items and assembling them
is clearly defined, and reliable and valid measures do result.



Chapter III 

THE CONSTRUCTION OF TEST ITEMS

3.1 ESSAY QUESTIONS

3.1.1 In composing essay questions, the teacher should be especially
careful (1) to state the question itself unambiguously, and (2) to
specify clearly within what limits the topic is to be treated, so
that the student's efforts are clearly focused on a concrete task,
and so that different students' essays on a given topic can be
compared with one another.

3.1.2 Among the abilities which can be tested effectively by using
carefully written essay questions are the following:

(1) to compare and contrast (people, events, objects, etc.);

(2) to develop and defend an opinion;

(3) to discern and explain a cause or an effect;

(4) to summarize;

(5) to analyze complex phenomena into their component parts;

(6) to give examples illustrating rules or principles;

(7) to criticize the adequacy, relevance, or accuracy of a
given statement, assertion, or opinion;

(8) to reason inductively or deductively.

3.1.3 The beginning or intermediate student can be asked to write
more or less lengthy and complex pieces of prose according to his
ability; he can be asked, for example, simply to state clearly and
straightforwardly certain information he is presumed to possess, or
to describe accurately a person, object, or place with which he is
familiar.

3.2 OBJECTIVE ITEMS 

3.2.1 Multiple-Choice Items 

3.2.1.1 A multiple-choice item may take several forms: (1) it
may be a direct question followed by several possible answers;
(2) it may be an incomplete sentence (called the stem) followed by
several possible completions; or (3) it may be a problem, a graph,
a diagram, etc., followed by several statements. The student may
be asked to select the one choice that is correct, the one that is
incorrect, or the one that is best. These three kinds of answers
combined with the three item forms give the test constructor nine
possibilities. Beginners are usually more successful at writing
questions than at composing items consisting of stem and completions.

Multiple-choice items can be used to measure the student's
ability to discriminate, interpret, analyze, make inferences, and
solve problems. It is sometimes argued that multiple-choice items
are inherently weak in that students are required merely to recognize
and judge proposed solutions, interpretations, or distinctions;
such items are weaker, it is contended, than others in which the
student is required first to recall possible answers and then to
select the correct one. Others argue that the kinds of problems
posed by multiple-choice items come much nearer to real life than
do those of any other type of test question. They contend that people
are rarely called upon to provide all possible solutions to a problem
and then choose among them; rather, several alternative possible
solutions are frequently apparent and the essential problem is to
identify the best or correct solution. Studies have shown that the
ability to recall correct solutions correlates highly with the
ability to recognize correct solutions; it seems justifiable,
therefore, to continue to make widespread use of carefully
constructed multiple-choice items.

Critics of multiple-choice items often base their criticism
on the assumption that such items can test only facts or definitions.
In practice, such criticism is too often justified: the teacher who
uses multiple-choice items frequently finds it easiest to write
only items that test facts or definitions. It takes more time,
imagination, and ingenuity to develop test items that measure a
student's ability to interpret, to draw inferences, to apply
knowledge, or to think critically. Multiple-choice items can be
written, however, to test these abilities with great subtlety.
Carefully written multiple-choice items provide high validity and
reliability, and they are easy to score.

3.2.2 Construction of Multiple-Choice Items

3.2.2.1 Three general difficulties are encountered in the
construction of multiple-choice items: (1) the language of the item
must be clear and unambiguous; (2) the answer must be unquestionably
correct; and (3) the distractors must be attractive to those students
who do not possess the knowledge or understanding necessary to
recognize the correct answer.



3.2.2.2 Here is a list of specific points that the teacher should
bear in mind as he writes multiple-choice items:

(1) The directions should tell the student explicitly whether
more than one answer is possible and whether he is to select the
correct answer or the best answer.

(2) The stem or question should be worded simply and
understandably. Use words appropriate to the subject matter and to the
group being tested. Be as succinct as possible. The solution to
the question should not depend upon the student's ability to
understand unnecessarily complicated language.

(3) Each item should pose one question; do not test two or
more points in a single item.

(4) Each item should be independent of all other items in
the same test. Students should not be able to select the correct
answer to one question because of information gleaned from another
item, and they should not be penalized several times for missing
one item: if a student misses one item, he will also miss all others
that depend on it.

(5) The choices should be as short as possible. Words
repeated in each of the alternatives can usually be added to the
stem; a question can often be reworded to simplify the responses.

(6) The best distractors are based on common mistakes made 
by the students. Excellent distractors can often be derived from 
wrong answers to essay questions and completion items. 

(7) Use four or five choices whenever possible. This
reduces the chance that the student will guess the correct answer.
Use the same format throughout a test.

(8) All distractors should seem equally plausible to the
student who does not know the correct answer or cannot arrive at it
logically. (If, for example, two of four alternatives are obviously
wrong, the question becomes a true/false item: one of the two
remaining alternatives is the correct answer.)

(9) It is best to use random order for the positions of the
correct answers. Students are quick to perceive patterns or apparent
patterns.

(10) Do not make the correct response consistently longer or
shorter than the distractors. (It often happens that the teacher
makes the correct answer more detailed than the others.)

(11) If plausible distractors are difficult to find, use
another type of item.

(12) In using stems that are incomplete sentences, write the
item so that the missing part of the sentence is to be added at the
end rather than at the beginning or within the sentence.
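Rules (7) and (9) can be applied while the answer key is being
drafted. The Python sketch below is a hypothetical helper and not
part of the handbook. It assigns the position of the correct answer
for each item so that every letter is used about equally often, then
shuffles the order. It also computes the average number of items a
student would get right by blind guessing, which is N/n for N items
of n choices. Balancing the letters is an added assumption; rule (9)
asks only for random order.

```python
import random
import string


def balanced_key(n_items, n_choices=4, seed=None):
    """Letter (A, B, C, ...) of the correct answer for each item.

    Every letter is used as equally often as possible, then the order is
    shuffled so that students cannot detect a pattern.
    """
    if not 2 <= n_choices <= 26:
        raise ValueError("n_choices must be between 2 and 26")
    letters = string.ascii_uppercase[:n_choices]
    key = [letters[i % n_choices] for i in range(n_items)]
    random.Random(seed).shuffle(key)
    return key


def expected_chance_score(n_items, n_choices):
    """Average number of items a student gets right by blind guessing."""
    return n_items / n_choices


key = balanced_key(20, n_choices=4, seed=1)
print("".join(key))                  # five of each letter, in shuffled order
print(expected_chance_score(40, 2))  # 20.0 (true/false)
print(expected_chance_score(40, 5))  # 8.0 (five choices)
```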

3.2.3 True/False Items

Many teachers believe that true/false items are the best type
of item to measure easily and accurately the student's knowledge of
specific facts.

3.2.4 Construction of True/False Items

3.2.4.1 Here are some important considerations that the teacher
should bear in mind when he prepares true/false items:




(1) If the answers are to be indicated on the test paper
itself, have the student circle the proper response. (A common
practice is to require the student to write T or F, t or f, / or -,
or / or 0, for true and false, respectively. It is often very
difficult to distinguish between such marks, especially when
erasures are permitted.)

(2) State each item clearly and specifically. 

(3) A true/false item should deal with a single definite 
topic. Whether the item is to be judged true or false should de- 
pend upon an important aspect of that question. 

(4) In a two-part item, the crucial element should come at
the end. The first part should set the problem. (For example, in
an item designed to evaluate students' understanding of the effect
of a given cause, put the effect first.)

(5) Use approximately equal numbers of true items and false
items.

(6) Avoid words that give irrelevant clues to the answer.
Such words (called specific determiners) enable the student to answer
correctly without possessing the specific knowledge in question.
(Studies have shown that statements containing such words as always,
no, never, all, none, etc., are usually false, while statements with
such moderate words as some, may, often, generally, etc., are usually
true. If you are careful to balance the number of true and false
statements containing terms of these kinds, the "determining" effect
will be greatly reduced. Wherever possible, it is best to avoid such
words altogether.)



(7) Use quantitative rather than qualitative language wherever
possible. (Terms such as large, many, important, better, etc., should
be avoided. Such expressions are ambiguous and have very different
values for different individuals.)

(8) In typing the final copy of a test, double-space the true/false
items. (If true/false items are typed too close together, they can be
hard to read and confusing to answer.)

(9) Avoid compound statements consisting of two or more
essentially independent parts. (If one part is true and another
false, the statement is neither wholly true nor wholly false. The
student cannot mark such an item true or false unless he is given
specific directions for marking items which are only partially
true. Often in such items each part really deals with a separate
problem. In such cases the question should be broken up into two
items, each dealing with a single fact or idea.)

(10) Avoid double negatives. (Students who know the infor- 
mation involved in a question may be confused unnecessarily by 
double negatives. ) 

(11) Do not use word-for-word statements extracted from
textbooks, syllabi, lecture notes, etc. (Often, when such statements
are taken out of context, they are ambiguous. A true/false
item that presents a significant fact or a generalization in a new
context is less likely to test mere recall.)
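Rules (5) and (6) can be checked mechanically on a draft. The Python
sketch below is a hypothetical aid and not a procedure from this
handbook. It flags the specific determiners named in rule (6) and
warns when true and false items are unbalanced. The word lists cover
only the English examples given above. Items written in French,
German, Italian, Russian, or Spanish would need their own lists. The
tolerance of one item is an assumption.

```python
import re

# Specific determiners named in rule (6); English only (an assumption).
SWEEPING = {"always", "no", "never", "all", "none"}
MODERATE = {"some", "may", "often", "generally"}


def determiners(statement):
    """Return the specific determiners that appear in a statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return words & (SWEEPING | MODERATE)


def review(items):
    """Return warnings for a draft list of (statement, is_true) pairs."""
    warnings = []
    n_true = sum(1 for _, is_true in items if is_true)
    n_false = len(items) - n_true
    if abs(n_true - n_false) > 1:  # assumed tolerance for "approximately equal"
        warnings.append(f"unbalanced key: {n_true} true, {n_false} false")
    for number, (statement, _) in enumerate(items, start=1):
        found = determiners(statement)
        if found:
            warnings.append(f"item {number}: specific determiner(s) {sorted(found)}")
    return warnings


draft_items = [
    ("Spanish nouns ending in -a are always feminine.", False),
    ("French adjectives generally follow the noun.", True),
    ("Berlin is the capital of Germany.", True),
    ("Russian has two grammatical genders.", False),
]
for warning in review(draft_items):
    print(warning)
```

A flagged word is not automatically an error. As rule (6) notes, its
effect can also be reduced by balancing such terms between true and
false statements.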

3.2.5 Modified True/False Items 

A. The effect of guessing on true/false items can be reduced
by increasing the number of possible answers to the question.
Instead of asking the student to choose between true and false,
he can be asked to choose among three possible answers: true,
uncertain, or false; correct, partially correct, or incorrect;
agree, undecided, or disagree; etc. Such a device reduces to one
in three the chance that the student will guess the correct answer,
and it provides more information than the standard true/false form.

B. The corrected true/false item is a modification designed
to reduce guessing and to direct the attention of the student to
the crucial element in the statement. The crucial element of the
statement is underlined. The student is directed to pay attention
to this key word or phrase and to use it as the basis on which to
decide whether the statement is true or false. If he decides that
the statement is false, he must then correct it by substituting an
appropriate word or phrase for the underlined crucial element so
that the statement is true. This form implies two possible scores
for each item: one point for a correct answer without appropriate
correction and two points for a correct answer with appropriate
correction. Since items of this type take more time than simple
true/false items, fewer of them can be used in a given time.

C. A third modification also reduces guessing and increases 
information, but at the expense of increased difficulty in grading: 
the standard true/false form is used, but the student is also re- 
quired to state, in a few short sentences, why the statement is true 
or false. Scoring can be adjusted to cover all possibilities. 
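The scoring rule for the corrected true/false item in B can be
sketched as follows. The handbook gives one point for correctly
judging a false statement without an appropriate correction and two
points with one. It does not say what a correctly marked true
statement earns or whether a wrong judgment earns anything. In this
Python sketch, the function name, the zero for a wrong judgment and
the adjustable true_item_points parameter are all assumptions.

```python
def corrected_tf_points(is_true, marked_true, correction_ok=False, true_item_points=2):
    """Points for one corrected true/false item (scheme B).

    False statements: 1 point for marking the statement false, 2 if the
    underlined element is also replaced appropriately (from the handbook).
    Wrong judgments score 0, and correctly marked true statements score
    true_item_points; both of these are assumptions, not handbook rules.
    """
    if marked_true != is_true:
        return 0  # wrong judgment earns nothing (assumed)
    if is_true:
        return true_item_points
    return 2 if correction_ok else 1


# Hypothetical answers to four items: (is_true, marked_true, correction_ok).
answers = [(False, False, True), (False, False, False), (True, True, False), (False, True, False)]
print(sum(corrected_tf_points(t, m, c) for t, m, c in answers))  # 2 + 1 + 2 + 0 = 5
```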



3.2.6 Matching Sets

3.2.6.1 Matching sets provide a convenient way to measure knowledge
of a series of facts, principles, relationships, or interpretations.
Matching sets usually consist of material arranged in two columns:
the items in the first column provide stimuli and the items in the
second column serve as responses. The student's task is to select
the one response which is most closely related to each stimulus.

3.2.6.2 Matching sets may be thought of as condensed series of
multiple-choice items. A matching set with six pairs of related
terms can be used to measure the same thing as six separate but
related multiple-choice items. These six items would require many
more words, and more time for the student to read and answer the
questions.

3.2.6.3 The principal advantage of matching sets, therefore, is
their ability to measure quickly the student's ability to discriminate
among several similar items as they are related in a given way.
Matching sets are relatively easy to write, but they are often very
poorly conceived. Nevertheless, ease of construction is also one
of the major advantages of matching sets. Other advantages are
their applicability to many different kinds of subject matter, their
comparative freedom from guessing effect, and their minimal
dependence on the student's reading ability or reading speed.

3.2.6.4 Major disadvantages of matching sets are that it is easy
to fall into the habit of composing items of this type that
overemphasize mere recall, and that it is easy to give inadvertent
clues to the correct answers.

3.2.7 Construction of Matching Sets

Many of the precautions mentioned in the discussion of the
construction of multiple-choice and true/false test items are
equally applicable here (use simple, clear statements, be
grammatically consistent, define the problem clearly, avoid giving
clues, etc.). The following additional considerations should be
borne in mind in writing matching sets:

(1) The stimulus column should be on the left, the response
column on the right. Each of the terms in the left-hand column
should have a number (each of them is a separate item). The items
in the right-hand column should be lettered. When no separate answer
sheet is used, have the student place his answer to each item in a
space to the left of the number of the item. The use of a number
for each stimulus and a letter for each response simplifies
directions.

(2) It is important that the items listed in each of the
columns be homogeneous. If the terms listed in the response column
are not homogeneous, the student may be provided with clues which
will help him to match the terms in the two columns. The items are
then proportionately easier. The selection of the correct term
should depend on the student's knowledge of the relationship being
tested rather than upon his ability simply to eliminate incorrect
answers. (If the items are heterogeneous, there is also the
possibility that the student may discern a relationship other than
the one the examiner intends; he can then argue that his answers are
also correct.)

(3) In writing matching sets in which the student is to match
single words with long definitions, names of persons with quotations,
or other single words or short phrases with long phrases, put the
long phrases (definitions, quotations, etc.) in the stimulus column
and the single words or short phrases (names, dates, etc.) in the
response column. This arrangement reduces the amount of time
required to answer the question; the student rereads the shorter
rather than the longer list of elements to make his choice.

(In matching sets involving terms and definitions, the student
is usually given the definition and he must select from among a
number of possibilities the term it defines. Many teachers prefer
to give the student the term and ask him to choose a definition; for
this purpose the order suggested above must obviously be reversed:
the short element, the term, appears in the left-hand column and the
longer elements, the definitions, in the right-hand column.)

(4) To reduce the effect of guessing, the response column 
should contain a few more elements than the stimulus column. The 
extra elements should be homogeneous with the other response
elements if they are to serve as effective distractors.

(5) Experience has shown that the stimulus column should
not contain more than ten elements. In instances where many more
homogeneous stimulus and response items are possible and desirable
on a test, it is best to divide them into a number of matching sets.
Shorter sets decrease the reading time and thus the time needed to
locate the correct answer.

(6) To enable the student to work efficiently, the items in
the response column should be placed in some logical order. For
example, dates should be listed chronologically; names or places
should be listed alphabetically, etc. A logical order enables the
student to scan the list quickly to see whether the name or term
he has in mind is among those listed. A random order of elements
merely increases reading difficulty unnecessarily.

(7) All parts of a matching set (the directions and the
stimulus and response columns) should be on the same page. The
student should not have to turn the page to match stimulus items
with response items. Having to do so increases the difficulty of
taking the test and may actually decrease the efficiency of the test
item for measuring purposes.
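For a teacher who keeps items on file, the layout guidelines above can be checked mechanically. The following Python sketch works under stated assumptions rather than following a procedure from this handbook: pairs are (stimulus, response) tuples with distinct responses, the function names are hypothetical, and responses are sorted alphabetically, which suits words and names (dates would need chronological sorting instead). Guideline (7), keeping the set on one page, is left to whoever types the test.

```python
MAX_STIMULI = 10  # guideline (5): no more than ten stimuli per set


def split_into_sets(pairs, size=MAX_STIMULI):
    """Guideline (5): divide a long list of pairs into several sets."""
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def layout_matching_set(pairs, extra_responses):
    """Return the printed lines of one matching set and its answer key.

    pairs: (stimulus, correct response) tuples; responses must be distinct.
    extra_responses: homogeneous distractors, as in guideline (4).
    """
    if len(pairs) > MAX_STIMULI:
        raise ValueError("more than ten stimuli: use split_into_sets first")
    if not extra_responses:
        raise ValueError("add a few extra responses to reduce guessing")

    # Guideline (6): a logical order; alphabetical here.
    responses = sorted([resp for _, resp in pairs] + list(extra_responses))
    letter = {resp: chr(ord("A") + i) for i, resp in enumerate(responses)}

    lines = []
    for i in range(max(len(pairs), len(responses))):
        # Guideline (1): answer blank and numbered stimulus on the left...
        left = f"____ {i + 1}. {pairs[i][0]}" if i < len(pairs) else ""
        # ...and the lettered response on the right.
        right = f"{letter[responses[i]]}. {responses[i]}" if i < len(responses) else ""
        lines.append(f"{left:<36}{right}".rstrip())
    key = {i + 1: letter[resp] for i, (_, resp) in enumerate(pairs)}
    return lines, key


lines, key = layout_matching_set(
    [("le chien", "dog"), ("le chat", "cat"), ("le cheval", "horse")],
    extra_responses=["bird", "cow"],
)
print("\n".join(lines))
print(key)  # {1: 'D', 2: 'B', 3: 'E'}
```

Because the response letters are assigned only after sorting, the key is computed from the final order rather than typed by hand, which avoids one common source of scoring errors.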

3.2.8 Because it is difficult to write matching sets to measure
more complex mental abilities, most matching sets in practice test
knowledge of correlations between events and dates, events and places,
individuals and events, individuals and quotations, or other specific
facts. More imaginative uses of matching sets require the student to
match causes and effects, principles and applications, situations and
judgments, or problems and solutions. The student can also be asked
to match places or events and locations on a map, descriptions or
names and parts of a piece of equipment shown in a picture, etc.



Chapter IV: FRENCH
PREPARING TEST ITEMS



4.1 INTRODUCTION 

4.1.1 It is not easy to amass a large repertoire of effective test
items. In addition to the difficult task of writing understandable
instructions and credible distractors, the teacher must worry
constantly about how the test fits what he has taught. In
foreign-language work, it is often desirable for a test to sample all
the language skills; testing devices must be used which are
appropriate to each skill. We hope that the teacher will find the
following sections helpful in making his own test items. This manual
claims neither novelty nor completeness. We have tried, first, to be
explicit about the essential elements of successful foreign-language
tests and then to organize them into a useful working outline
accompanied by concrete examples.

4.2 THE ESSENTIAL ELEMENTS OF A TEST

4.2.1 There are three basic sets of elements in any language test: 
(1) the language skills, (2) specific testing devices, and (3) the
particular language areas to be tested. 

4.2.2 The Language Skills 

4.2.2.1 Teachers generally recognize four fundamental skills:
(1) listening comprehension, (2) speaking, (3) reading, and (4)
writing. In audio-lingual programs, listening comprehension is
considered to have primacy over the others. The position taken is
that one cannot speak, read, or write meaningfully in a language
without being able to understand that language. The other three
skills are also interrelated, but it is not clear that speaking
is as important for reading and writing as listening comprehension
is for all of them. Even in the native language, the reading skill
is often, perhaps always, more highly developed than the ability to
write.

4.2.2.2 Tests must take into consideration which specific skills
are to be evaluated. When the reading skill has been the main 
concern of the course, for instance, tests should not emphasize 
listening comprehension; in audio-lingual programs, which generally 
begin with listening comprehension and speaking, reading and writing 
should not be tested in the early stages of instruction. 

4.2.2.3 Of the four skills, listening comprehension and reading
are the easiest to evaluate objectively. The difficulties inherent in
testing speaking and writing are readily apparent: with listening
and reading it is a comparatively simple matter to control the
student's responses, but it is difficult to do so when the student is
to react orally or in writing. Judgment of spoken and written
responses involves much personal interpretation by the teacher. The
burden of administering and grading oral-response tests makes them
unwieldy; although written answers are somewhat easier to grade,
they are just as difficult to control as spoken responses. Having
a number of teachers grade written responses may actually increase
the degree of subjectivity, although some experts disagree.¹

4.2.3 Testing Devices

4.2.3.1 The most commonly used devices in foreign-language testing
are: (1) multiple-choice items, (2) completion items, (3) matching
sets, (4) true/false items, (5) translation, (6) dictation, and items
involving (7) expansion, (8) transformation, and (9) substitution.
The last three are especially useful for testing speaking and
writing. Of the other devices (see Chapter III), true/false items and
translation are the ones most often debated. It is very difficult to
phrase a true/false item so that it is clearly and unquestionably
true or false but not transparently so. Translation has several
shortcomings. Using translation from the native language into the
foreign language to discover the student's control of specific
problem areas (the subjunctive in French, for example) is ineffective
in cases where the student can find an alternative construction (the
infinitive, for example). Although translation from the foreign
language into the native language is easier to control, its
effectiveness is also somewhat limited because the student can
usually find clues to the meaning of the phrase or word to be
translated in language features (context, word order, etc.) other
than the specific feature being tested. Therefore, unless the teacher
can control quite strictly the context of a particular word or
phrase, it is difficult to determine the student's actual command of
the matter being tested. Extensive use of translation is obviously
open to attack in any curriculum that consciously emphasizes direct
use and knowledge of the foreign language with as little recourse as
possible to the student's native language. Perhaps translation is
more accurately a specialized skill than a testing device. There is
certainly no proof that the ability to understand, speak, read, and
write a foreign language also guarantees the ability to translate
freely between that language and one's own. Indeed, in institutions
with a sound foreign-language program designed for specialists,
translation is often taught in a separate course to which students
are admitted only after they have demonstrated a command of the other
skills.

¹ See, for example, The Measurement of Writing Ability by Fred
Godshalk, Frances Swineford and William Coffman, College Entrance
Examination Board, New York (1966). The articulation of norms
for written composition differs from language to language. For
an illuminating discussion of procedures, see How the French Boy
Learns to Write by Rollo Walter Brown, Harvard University Press
(1915), reprinted by the National Council of Teachers of English
(1963).

4.2.3.2 Multiple-choice items (i.e., choice of one correct answer
from two or more possible answers) and matching sets are easy to
control since variation in answers is strictly limited. To be
effective, multiple-choice items must have distractors which are
sufficiently similar to the correct choice to attract the student
who does not really know what the correct answer is. Incorrect
answers can reveal the individual student's problems. For example,
a wrong choice, on an item testing listening comprehension, between
le ton and le temps, or between l'amant and la main, may be
symptomatic of a general inability to distinguish nasal vowels from
one another. By increasing the number of possible answers, the
teacher can gain more information about the individual student's
weaknesses and needs.
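The diagnostic use of wrong answers described above can be sketched by tagging each distractor with the contrast it probes and counting, for one student, which tags attract his wrong choices. The item data, tag names and the `min_count` threshold below are hypothetical illustrations, not categories defined in this handbook.

```python
from collections import Counter

# Hypothetical key for three listening items. Each distractor letter is
# tagged with the sound contrast a student who picks it may be missing
# (e.g. item 1: option A "le ton", option B "le temps").
ITEMS = {
    1: {"answer": "A", "tags": {"B": "nasal vowels", "C": "final consonants"}},
    2: {"answer": "C", "tags": {"A": "nasal vowels", "B": "liaison"}},
    3: {"answer": "B", "tags": {"A": "nasal vowels", "C": "liaison"}},
}


def diagnose(responses, items=ITEMS, min_count=2):
    """List the contrasts behind a student's repeated wrong choices.

    responses: dict mapping item number to the letter the student chose.
    Returns tags chosen at least min_count times, most frequent first.
    """
    errors = Counter(
        items[n]["tags"][choice]
        for n, choice in responses.items()
        if choice != items[n]["answer"] and choice in items[n]["tags"]
    )
    return [tag for tag, count in errors.most_common() if count >= min_count]


print(diagnose({1: "B", 2: "A", 3: "B"}))  # ['nasal vowels']
```

Each additional distractor adds another tag an item can report, which is one way to read the remark that more possible answers give the teacher more information.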

4.2.3.4 There are two ways of constructing matching sets: (1) the
same number of items may be supplied in each of two columns, or (2)
more items may appear in the column from which the "answers" are to
be selected, thereby leaving a residue when the answers have been
chosen. In either case, matching is a type of interlocking multiple
choice. Guessing, however, can be a factor in both matching and
multiple-choice items. A formula for penalizing random choices can
be used; it should be explained to the students in an effort to
discourage them from making irresponsible guesses.
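The handbook does not say which penalty formula to use. A widely used one, the classical correction for guessing, assumes that a student who does not know an answer picks blindly among the k options, so that for every k - 1 wrong guesses he gets about one right by luck: corrected score = R - W/(k - 1), with omitted items counted as neither right nor wrong. The Python sketch below applies it, and also shows why the residue of construction (2) matters: a student who knows all but one pair of a set with equal columns gets the last pair free by elimination, while extra responses make that last match a genuine guess. Function names are hypothetical.

```python
from fractions import Fraction


def corrected_score(right: int, wrong: int, options: int) -> Fraction:
    """Classical correction for guessing: R - W / (k - 1).

    Omitted items are neither right nor wrong and cost nothing.
    """
    if options < 2:
        raise ValueError("an item needs at least two options")
    return right - Fraction(wrong, options - 1)


def expected_lucky_matches(stimuli: int, responses: int, known: int) -> Fraction:
    """Expected extra matches from pure guessing on one matching set.

    The student matches the pairs he knows, then assigns the remaining
    stimuli at random to unused responses (each response used once).
    By symmetry each remaining stimulus is right with probability
    1 / (responses - known).
    """
    remaining = stimuli - known
    if remaining <= 0:
        return Fraction(0)
    return Fraction(remaining, responses - known)


# 40 four-option items: 28 right, 8 wrong, 4 omitted.
print(corrected_score(28, 8, 4))  # 76/3, about 25.3

# Six stimuli, five pairs known: equal columns give the last one away;
# two extra responses leave only a 1-in-3 chance.
print(expected_lucky_matches(6, 6, 5))  # 1
print(expected_lucky_matches(6, 8, 5))  # 1/3
```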

4.2.3.5 Expansion items test the student's knowledge of the
construction of sentences in the foreign language. The student may be
asked to expand a sentence in a number of different ways. In the
following example, he is being tested on his knowledge of the
positioning of adjectival modifiers in noun phrases:



Instructions: In each of the following sentences insert
the word you are given.

Sentence No. 1: Les femmes étaient arrivées.
    Insert: jolies
    Answer: Les jolies femmes étaient arrivées.

Sentence No. 2: Les dames causaient.
    Insert: distinguées
    Answer: Les dames distinguées causaient.
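The answer key for expansion items like these can be drafted from a rough rule of French adjective placement: most adjectives follow the noun, while a short list of common ones (beau, joli, petit, grand, bon, jeune, vieux, and a few others) usually precede it. The Python sketch below is a toy built on two loud assumptions: the noun is taken to be the second word of the sentence, and the list of preceding adjective stems is hypothetical and incomplete. The teacher should still check every generated answer.

```python
# Hypothetical, incomplete list of adjective stems that usually precede
# the noun in French; prefix matching also catches jolies, petites, etc.
PRENOMINAL_STEMS = ("beau", "bel", "bell", "joli", "petit", "grand",
                    "bon", "jeune", "vieu", "vieil", "mauvais", "gros")


def expand_sentence(sentence: str, adjective: str) -> str:
    """Insert `adjective` before or after the noun of `sentence`.

    Assumes the sentence starts with determiner + noun ("Les femmes").
    """
    words = sentence.split()
    noun_index = 1  # assumption: word 0 is the determiner, word 1 the noun
    if adjective.lower().startswith(PRENOMINAL_STEMS):
        words.insert(noun_index, adjective)      # before the noun
    else:
        words.insert(noun_index + 1, adjective)  # after the noun
    return " ".join(words)


print(expand_sentence("Les femmes étaient arrivées.", "jolies"))
# Les jolies femmes étaient arrivées.
print(expand_sentence("Les dames causaient.", "distinguées"))
# Les dames distinguées causaient.
```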



4.2.3.6 Transformation items are particularly suited to evaluating
writing. The student is given instructions and a model; for example,
he is given a sentence with a verb in the present tense to be changed
to a past tense. A series of sentences follows with the verb in the
present tense which the student is to change to the past.
Transformation items can be written to test more complex matters such
as making active sentences passive, inserting subordinate clauses into
simple sentences to make complex sentences, etc. It is always
advisable to furnish a model item as part of the instructions.

4.2.3.7 In substitution items, the student is asked to replace a
word, part of a word, or phrase with another:

Instructions: Replace the underlined words with others
which are equally correct.

Sentence: Je me suis lavé _les mains_.

Possible Answer: Je me suis lavé _la figure_.
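Because a substitution item admits more than one correct answer, scoring is easier if the teacher lists in advance the replacements he will accept. The Python sketch below assumes such a teacher-made list (the entries shown are hypothetical) and, as in the example above, that the underlined words end the sentence. It checks that the student kept the unchanged part and filled the slot with an accepted phrase.

```python
import unicodedata


def normalize(text: str) -> str:
    """Compose accents, lower-case, drop a final period, collapse spaces."""
    text = unicodedata.normalize("NFC", text).strip().lower().rstrip(".")
    return " ".join(text.split())


def check_substitution(answer, frame, underlined, accepted):
    """True if `answer` keeps `frame` and replaces the underlined words
    with one of the teacher's `accepted` alternatives.

    Assumes the underlined words come at the end of the sentence.
    """
    answer, frame = normalize(answer), normalize(frame)
    if not answer.startswith(frame + " "):
        return False  # the part that should stay unchanged was altered
    slot = answer[len(frame) + 1:]
    if slot == normalize(underlined):
        return False  # nothing was substituted
    return slot in {normalize(a) for a in accepted}


# Hypothetical list of replacements the teacher will accept.
ACCEPTED = ["la figure", "le visage", "les pieds", "les cheveux"]
FRAME = "Je me suis lavé"

print(check_substitution("Je me suis lavé la figure.", FRAME, "les mains", ACCEPTED))  # True
print(check_substitution("Je me suis lavé les mains.", FRAME, "les mains", ACCEPTED))  # False
```

An answer the list does not foresee is simply marked wrong, so in practice the teacher would review rejected answers and extend the list.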

4.2.3.8 Dictation is a well-known device in foreign-language
instruction. However, it is difficult to grade straight dictation.
The teacher must decide how to sort out and weight each student's
individual errors. One way to avoid many problems is to give the
student a partially filled-in answer sheet. As the teacher reads the
dictation, the student writes in only what has been omitted from the
text he has before him. The blanks may become more numerous as the
student progresses in his ability to take dictation. Dictation has
traditionally played an important role in the teaching of French.
As a testing device, it is most useful for testing the student's
ability to understand oral French and his knowledge of French
orthography.
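The partially filled-in answer sheet can be produced mechanically. The Python sketch below blanks every n-th word of the dictation text, a hypothetical rule chosen for simplicity (a teacher would normally pick the blanks to target particular spellings), and scores only the blanks, so exact spelling, accents included, is all that counts. Lowering `every` gives the more demanding sheets suggested for students who have progressed.

```python
def make_sheet(text, every):
    """Blank out every `every`-th word; return (answer sheet, key).

    Punctuation attached to a word is treated as part of that word.
    """
    sheet, key = [], {}
    for i, word in enumerate(text.split(), start=1):
        if i % every == 0:
            key[i] = word
            sheet.append(f"({i})______")
        else:
            sheet.append(word)
    return " ".join(sheet), key


def score_blanks(key, answers):
    """Count blanks filled with exactly the right spelling."""
    return sum(1 for i, word in key.items() if answers.get(i, "") == word)


text = "Les enfants sont allés à la plage avec leurs parents."
sheet, key = make_sheet(text, every=3)
print(sheet)  # Les enfants (3)______ allés à (6)______ plage avec (9)______ parents.
print(score_blanks(key, {3: "sont", 6: "la", 9: "leur"}))  # 2
```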







4.2.4 The Language Areas

4.2.4.1 The third basic set of elements involved in the preparation
of foreign-language tests, the language areas to be tested, is the
least clearly defined and by far the largest of the three. Opinions 
about the relative importance of various language areas differ widely. 

4.2.5 Contrastive Analysis

4.2.5.1 The student's native language is carefully contrasted with
the foreign language he is studying because it is believed that the
language learner will make numerous predictable errors when he
incorrectly transfers the habits he has acquired in his native language
to the language he is studying. In other words, the learner tends 
to express himself in the foreign language in terms of the rules of 
the language he already knows. He will automatically assume that 
whatever is not clearly different in the foreign language is the 
same as in his own language. A thorough knowledge of the differences 
that exist between the two languages can therefore help the teacher 
to anticipate problems. 

4.2.5.2 A foreign-language test could well consist very largely of
items on the areas where the two languages differ. If the teacher 
knows, for example, that French grammatical gender is different from 
English gender, he can expect gender to be a problem for the English- 
speaking learner and may therefore devote a large proportion of his 
testing to gender in the early stages of instruction. 
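One way to act on this advice is to give each contrastive problem area a weight reflecting how much trouble the teacher expects it to cause, then divide the items of a test in proportion. The Python sketch below uses the largest-remainder method so that the item counts add up exactly to the test length; the areas and weights are hypothetical examples for an English-speaking class, not figures from this handbook.

```python
import math
from fractions import Fraction


def allocate_items(weights, total):
    """Split `total` test items across areas in proportion to `weights`.

    Largest-remainder method: each area first gets the whole part of its
    exact share; leftover items go to the largest fractional parts.
    """
    weight_sum = sum(weights.values())
    shares = {area: Fraction(w * total, weight_sum) for area, w in weights.items()}
    counts = {area: math.floor(share) for area, share in shares.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(shares, key=lambda a: shares[a] - counts[a], reverse=True)
    for area in by_remainder[:leftover]:
        counts[area] += 1
    return counts


# Hypothetical weights for an early-stage French test.
weights = {"gender": 5, "adjective position": 3, "nasal vowels": 2, "liaison": 1}
print(allocate_items(weights, 30))
# {'gender': 14, 'adjective position': 8, 'nasal vowels': 5, 'liaison': 3}
```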

4.2.5.3 Wherever the native language of the student has a feature
which is lacking or simpler in the foreign language, the problem is



